Incorporating Trustiness and Collective Synonym/Contrastive Evidence into Taxonomy Construction

Authors

  • Anh Tuan Luu
  • Jung-jae Kim
  • See-Kiong Ng

Abstract

Taxonomies play an important role in many applications by organizing domain knowledge into a hierarchy of is-a relations between terms. Previous work on identifying taxonomic relations from text corpora falls short in two respects: 1) it does not consider the trustiness of individual source texts, which is important for filtering out incorrect relations from unreliable sources; 2) it does not consider collective evidence from synonyms and contrastive terms, where synonyms may provide additional support for taxonomic relations, while contrastive terms may contradict them. In this paper, we present a method of taxonomic relation identification that incorporates the trustiness of source texts, measured with techniques such as PageRank and knowledge-based trust, and the collective evidence of synonyms and contrastive terms, identified by linguistic pattern matching and machine learning. The experimental results show that the proposed features consistently improve performance, by up to 4%-10% in F-measure.
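
As a rough illustration of the two ideas in the abstract, the Python sketch below weights each extracted is-a relation by a PageRank-style trust score of its source, pools evidence across synonyms, and subtracts trusted contrastive evidence. The abstract does not give the actual scoring formula, so the additive combination, the function names (`pagerank`, `relation_score`), and the toy data are all assumptions made for this example and should not be read as the authors' method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A page without outgoing links spreads its rank over all pages.
            targets = links.get(node) or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def relation_score(hypo, hyper, isa_evidence, contrast_evidence, trust, synonyms):
    """Hypothetical score for the candidate relation 'hypo is-a hyper'.

    isa_evidence / contrast_evidence: lists of (term_a, term_b, source).
    Assumption: evidence is combined by simply adding source trust scores.
    """
    # Evidence found for a synonym of `hypo` counts as support as well.
    equivalents = {hypo} | synonyms.get(hypo, set())
    support = sum(trust.get(src, 0.0)
                  for x, y, src in isa_evidence
                  if x in equivalents and y == hyper)
    # Assumption: a trusted source contrasting the two terms counts against.
    against = sum(trust.get(src, 0.0)
                  for a, b, src in contrast_evidence
                  if {a, b} == {hypo, hyper})
    return support - against


# Toy data (hypothetical): a small link graph and a few extracted relations.
links = {"siteA": ["siteB"], "siteB": ["siteA"], "siteC": ["siteA"]}
trust = pagerank(links)

isa_evidence = [("dog", "animal", "siteA"),
                ("canine", "animal", "siteB"),
                ("dog", "vehicle", "siteC")]
contrast_evidence = [("dog", "vehicle", "siteA")]
synonyms = {"dog": {"canine"}}

good = relation_score("dog", "animal", isa_evidence, contrast_evidence, trust, synonyms)
bad = relation_score("dog", "vehicle", isa_evidence, contrast_evidence, trust, synonyms)
```

In this toy setup, `siteC` receives no incoming links and so ends up with the lowest trust. The relation it alone supports is then outweighed by contrastive evidence from a more trusted source, while the "dog is-a animal" candidate gains support from its synonym "canine".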

Similar resources

Toward critical contrastive rhetoric

A traditional approach to contrastive rhetoric has emphasized cultural difference in rhetorical patterns among various languages. Despite its laudable pedagogical intentions to raise teachers’ and students’ cultural and rhetorical awareness in second language writing, traditional contrastive rhetoric has perpetuated static binaries between English and other languages and viewed students as cult...

Some notes on taxonomy and diversity of Onosma with emphasis on important evidence and complex groups in Flora Iranica

Onosma L. is a species-rich taxon in Boraginaceae, comprising about 150–180 species and centered mainly in the Irano-Turanian region. The genus faces several systematic complexities that lead to many identification problems, as well as morphological polymorphism. Several authors have used setae characteristics in Onosma as the most important diagnostic evidence in delimitation and classification of species in add...

PATTY: A Taxonomy of Relational Patterns with Semantic Types

This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY t...

Iterative TempoWordNet

TempoWordNet (TWn) has recently been proposed as an extension of WordNet, where each synset is augmented with its temporal connotation: past, present, future or atemporal. However, recent uses of TWn show contrastive results and motivate the construction of a more reliable resource. For that purpose, we propose an iterative strategy that temporally extends glosses based on TWn to obtain a poten...

Practice in Synonym Extraction at Large Scale

Synonym extraction is an important task in natural language processing and is often used as a submodule in query expansion, question answering and other applications. An automatic synonym extractor is highly preferred for large-scale applications. Previous studies in synonym extraction are mostly limited to small-scale datasets. In this paper, we build a large dataset with 3.4 million synonym/nonsynony...

Publication date: 2015